Active Reinforcement Learning with Monte-Carlo Tree Search
نویسندگان
چکیده
Active Reinforcement Learning (ARL) is a twist on RL where the agent observes reward information only if it pays a cost. This subtle change makes exploration substantially more challenging. Powerful principles in RL like optimism, Thompson sampling, and random exploration do not help with ARL. We relate ARL in tabular environments to BayesAdaptive MDPs. We provide an ARL algorithm using Monte-Carlo Tree Search that is asymptotically Bayes optimal. Experimentally, this algorithm is near-optimal on small Bandit problems and MDPs. On larger MDPs it outperforms a Q-learner augmented with specialised heuristics for ARL. By analysing exploration behaviour in detail, we uncover obstacles to scaling up simulation-based algorithms for ARL.
منابع مشابه
Deep Reinforcement Learning with Model Learning and Monte Carlo Tree Search in Minecraft
Deep reinforcement learning has been successfully applied to several visual-input tasks using model-free methods. In this paper, we propose a model-based approach that combines learning a DNN-based transition model with Monte Carlo tree search to solve a block-placing task in Minecraft. Our learned transition model predicts the next frame and the rewards one step ahead given the last four frame...
متن کاملReinforcement Learning via AIXI Approximation
This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the a...
متن کاملA Monte Carlo Tree Search approach to Active Malware Analysis
Active Malware Analysis (AMA) focuses on acquiring knowledge about dangerous software by executing actions that trigger a response in the malware. A key problem for AMA is to design strategies that select most informative actions for the analysis. To devise such actions, we model AMA as a stochastic game between an analyzer agent and a malware sample, and we propose a reinforcement learning alg...
متن کاملOn Monte Carlo Tree Search and Reinforcement Learning
Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread adoption within the games community. Its links to traditional reinforcement learning (RL) methods have been outlined in the past; however, the use of RL techniques within tree search has not been thoroughly studied yet. In this paper we re-examine in depth this close relation between the two fields; our ...
متن کاملUpper Confidence Trees and Billiards for Optimal Active Learning
This paper focuses on Active Learning (AL) with bounded computational resources. AL is formalized as a finite horizon Reinforcement Learning problem, and tackled as a single-player game. An approximate optimal AL strategy based on tree-structured multi-armed bandit algorithms and billiard-based sampling is presented together with a proof of principle of the approach. Motsclés : Apprentissage ac...
متن کامل